1. INTRODUCTION

In recent years, several epidemics have had serious consequences
worldwide. In 2009 came the pandemic of influenza A (H1N1), caused by a variant of the influenza A virus
(H1N1) with genetic material [1] from an avian strain, two swine strains and a human strain. The virus
mutated and jumped from pigs to humans, and then spread from person to person.
This pandemic officially began on June 11, 2009 and its end was announced on August 10, 2010, leaving
a total of 19000 victims. In December 2013, the Ebola epidemic started in Guinea, and it lasted until March
29, 2016. According to the World Health Organization, this outbreak left a total of 11323 victims and
28646 contagions, with a mortality rate of 70%.

In general, in any epidemic, the immediate availability of information about where the first infection
occurred, which individuals are infected [2], and which people have been in contact with infected people
can greatly limit the spread of the epidemic. Obtaining these data is not an easy task: in most cases
the information is not immediately available and requires a slow and complicated investigation.
However, time is a key factor in limiting the spread and effects of the epidemic.

There are several examples of the use of the Internet [3] and information systems [4] to help prevent and
limit the effects of epidemics [5]. One of the most famous cases was the monitoring and analysis
of influenza and dengue by Google [6]. However, there are other initiatives [7] such as FrontlineSMS [8],
a communication system based on SMS messages that arrive at a data center, which is responsible for spreading
the message among groups of health experts and responding with an SMS. FrontlineSMS:Medic is a more
specific subproject of FrontlineSMS. Another example is Ushahidi [9], an open source project that emerged
after the disputed 2007 elections in Kenya and was later used in the response to the 2010 earthquake
in Haiti [10]. It provides a system for collecting information through
SMS [11], Web, voice messages and email [12]. It also has tools for translation, classification and
georeferencing [13]. The information is presented in maps accessible via web or mobile phones [14].
The Asthmapolis [15] project focuses on asthma patients and tracks asthma attacks. It uses an inhaler equipped
with GPS together with a mobile application that keeps track of the frequency of attacks. Using these data and
an analysis of environmental causes, risk maps of outbreaks are generated [16]. The HealthMap [17]
website and the app Outbreaks Near Me [18] use unofficial sources of information [19] on the network to
track disease outbreaks [20] and perform real-time monitoring of possible epidemics [21]. They analyze
the information obtained and provide a unified view [22]. Flu Near You [23] is a website created with
the American Public Health Association and the Skoll Global Threats Fund of San Francisco that allows
individuals to send information about potential diseases. It produces weekly reports on health status.
The last case is Medic Mobile [24], which offers several applications to register and track disease outbreaks
faster, keep stock of essential medicines and communicate about emergencies [25].

This article describes a tool that offers a partial solution to the problem described. The application
is neither a medical protocol for responding to epidemics nor an emergency plan. It is an epidemic alert
system that uses medical information and data obtained from the mobile phone (geographical data and
information retrieved from the phone's sensors). The goal of the application is to retrieve data about
possible epidemics quickly, analyze them quickly to locate outbreaks and possible areas of infection,
and transmit the results to all users so that they know whether they may be infected and can avoid
contagion areas.

The article is structured as follows. In section 2, the objectives of the system are described.
In section 3, the problem of sources of information is described. In section 4, the software architecture is
presented. In section 5, the analysis of data performed by the system is described. Finally, conclusions and
future work are presented in section 6.


2. RESEARCH METHOD 

The main tasks of the application are to retrieve information about outbreaks, analyze it, and
inform system users through alerts on the mobile phone so that they can use the information
to prevent contagion. To do so, the system retrieves information
from three different sources. The first source is the mobile phone of each user of
the system. This information is obtained from the sensors built into the phone [26]. Essentially, it is
geolocation information, with the goal of knowing at every moment where the individual is located,
the places he or she has visited and at what times. Other data such as brightness or temperature are also obtained.

The second source of information is medical experts. Experts who use this application
must enter data on all the cases of infected people that they manage. Of all the medical information about
an infected patient, the following is used: geographic location, type of disease, symptoms,
the patient's condition, and the places where the individual has been and when.
The personal data of patients are not needed to process the information,
so patients remain anonymous. In any case, every patient is informed that their data will be used,
so they can decide whether or not to provide them. Additionally, other medical data obtained
from official sources are used to complement the information collected directly by doctors.

The third source of information is unofficial information extracted directly from the Internet [27].
Several information sources were chosen for this purpose: Twitter, Facebook and several online newspapers.
The system performs a combined analysis of keywords and sentiment analysis [28] to retrieve information about
possible outbreaks or the evolution of an identified outbreak. To process the information from all these
sources, a system was developed consisting of a web client and an Android application, four
databases (two relational and two non-relational) for data storage, and two RESTful APIs that
allow the mobile application and the web client to query the data and interact with the databases.

All users of the system must install the Android app, whose function is to retrieve data
from the sensors of the mobile phone and send them to the database that centralizes all the information gathered.
In addition, through this app, users receive alerts as well as all the information generated by the data analysis.
The second application is a web application used by doctors to enter information about the infected
patients they have attended. Doctors also use the web application to track
infected patients: upcoming visits and the evolution of the patient. In addition, the web application has a set
of internal scripts that perform various processing and information retrieval functions. On the one hand,
there is a process that periodically connects to the unofficial information sources mentioned above
(Twitter, Facebook and several online newspapers), retrieves information, and stores it in the database.
On the other hand, there is another internal script that, whenever new information is added to the system,
processes all the data to generate alerts to be sent to all system users. The alerts are of various types
depending on the situation of each user. Users who have not been in a focus of an epidemic and are
not in a contagion area receive an informational alert about where the outbreak occurred and which areas
should be avoided so as not to be exposed to possible infection. Users who could be infected receive
the same alerts and, in addition, are informed of the possible contagion. The system also suggests
that they go to a medical facility to check whether they really have been infected.
In this latter case, the script automatically adds the data of the users who may be infected to the set
of tracked users. In this way, when the user visits the health center, the doctor already knows
that this is a possible case of contagion. This action is also used to analyze other possible cases of indirect
contagion that could have originated from individuals who might be infected.
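
The alert rules described above can be illustrated with a minimal Python sketch. It is not the authors' script: the names `Alert`, `dispatch_alerts`, `possibly_infected` and `tracked` are hypothetical, and the set of possibly infected users is assumed to have been computed beforehand by the analysis.

```python
from enum import Enum


class Alert(Enum):
    """Hypothetical alert types, named after the two cases in the text."""
    OUTBREAK_INFO = "outbreak location and areas to avoid"
    POSSIBLE_CONTAGION = "possible contagion: visit a medical facility"


def dispatch_alerts(users, possibly_infected, tracked):
    """Return {user: [alerts]} and add possibly infected users to `tracked`."""
    outbox = {}
    for user in users:
        # Every user is told where the outbreak is and which areas to avoid.
        alerts = [Alert.OUTBREAK_INFO]
        if user in possibly_infected:
            # Possibly infected users also get the contagion warning ...
            alerts.append(Alert.POSSIBLE_CONTAGION)
            # ... and are added to the tracked set so the doctor sees the case.
            tracked.add(user)
        outbox[user] = alerts
    return outbox
```

In this sketch the tracked set is mutated in place, mirroring the text's statement that the script adds possibly infected users to the users being tracked.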

Finally, two different types of database are used to manage this information.
A relational MySQL database stores the data coming from the sensors of mobile phones and the data entered
by doctors about the patients they have attended. For data coming from the Internet, however,
a NoSQL MongoDB database is used [29]. The decision to use two different databases is due to
the different nature of the data. In the first case, the data have a fixed structure, while the structure
of the data obtained from the Internet can change every time they are retrieved.
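
The difference between the two kinds of data can be sketched as follows. This is an illustration only: SQLite from the Python standard library stands in for MySQL so that the example runs without a server, plain dictionaries stand in for MongoDB documents, and the table name, field names and sample values are all made up.

```python
import json
import sqlite3

# Fixed structure: every location sample has exactly the same columns.
# SQLite is used here only as a stand-in for the relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (user_id TEXT NOT NULL, ts TEXT NOT NULL, "
    "lat REAL NOT NULL, lon REAL NOT NULL)"
)
conn.execute(
    "INSERT INTO location VALUES (?, ?, ?, ?)",
    ("user-1", "2018-01-01T10:00:00", 40.4168, -3.7038),
)
rows = conn.execute("SELECT user_id, lat, lon FROM location").fetchall()

# Variable structure: records retrieved from the web need not share fields,
# which is what a document store tolerates without schema changes.
web_docs = [
    {"source": "twitter", "disease": "flu", "text": "sample tweet", "hashtags": ["flu"]},
    {"source": "newspaper", "disease": "flu", "title": "sample headline", "section": "health"},
]
serialized = [json.dumps(doc) for doc in web_docs]
```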


3. RESULTS AND DISCUSSION
3.1. The Android app 

The main function of the app is to retrieve data from the sensors of mobile phones in order to
analyze them and generate alerts about possible outbreaks, areas of infection or infection of the users.
To this end, the app sends the location and the time (‘timestamp’) of the users. The location is sent every
5 minutes or whenever a change of position of 100 meters is detected. This avoids saturating the system
with data transmissions, keeps the volume of data under control and, above all, saves battery, an aspect
that must be taken into account on mobile devices. This information is stored in the MongoDB database.
The main functionalities of the app are:

a. Registration. The user must install the app on the mobile phone and register in the app to use
the services of the system. The user must enter: name, surname, ID number, date of birth and email. When
the user accepts, a confirmation email is sent. Once the user confirms, the user can
enter the password to be used for the account.

b. Login. In the authentication window, the user must enter the ID number and password and then
click on the link "Enter". In addition, there is an option to create a new account.

c. Main interface. When the user logs into the account, the main navigation page is shown. On this page,
the user can choose from several options: "My account", "Settings", "About", "Technical support"
and "Logout". Under "Settings", the user can configure some options of the app: “Enable geolocation” (the app
can collect the geolocation data of the mobile phone and send them to the database), “Enable
notifications” (the user can receive the alerts and information that the system generates when analyzing
the data), “Exclusive use with Wi-Fi” (the app sends data only when a Wi-Fi connection is
available), “Language” (the user can change the language of the user interface), and a last option
that allows users to log in to Twitter and Facebook so that they can share the information generated by
the application. The option “My account” implements the core functionality of the app. It has 3 options:

— Check the current state of the user and the zone. On this page, users can check their situation
with respect to a possible infection. The application shows whether the user could be infected
because of contact with people who might be infected. It also shows whether there is an outbreak in
the area where the user is located, in which case it is advisable to avoid the area.
The page has two zones: the upper one indicates the current state of the user (whether the algorithm
has detected a health risk), while the lower one reports the status of the area in which the
user is at that moment (whether the algorithm has detected some risk of infection in the current area
or the area is free of diseases). When the screen is pulled down, the view is updated:
the user's current position is taken again to check whether there is any risk at the
new location.

— Check and delete news. On this page, the application shows all the news about epidemics
generated by the web application, such as the appearance of new infections or their extinction. News items
are common to all users. Each item is obtained from the information that the application retrieves from
Twitter, Facebook and online newspapers. In addition, the upper part shows data of
possible interest such as the municipality, the current temperature and the date. To update all the
information, the user just makes the 'swipe' gesture. It is also possible to delete a news item that has
already been reviewed or that is not useful.

— Consult profile. The user's profile page displays all the user data and allows the user's password
to be changed.
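
The location reporting rule stated at the start of this subsection (send every 5 minutes, or as soon as the position changes by 100 meters) can be sketched in Python as follows. The real app runs on Android, so this is only an illustration of the decision logic; the function names, the tuple layout and the use of the haversine formula for distance are assumptions, not details given in the paper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def should_send(last_sent, current, interval_s=300, min_move_m=100):
    """Decide whether `current` must be reported.

    Both arguments are (unix_seconds, lat, lon) tuples; `last_sent` is None
    before the first report. The defaults encode the 5 minute / 100 m rule.
    """
    if last_sent is None:
        return True
    t0, lat0, lon0 = last_sent
    t1, lat1, lon1 = current
    if t1 - t0 >= interval_s:
        return True
    return distance_m(lat0, lon0, lat1, lon1) >= min_move_m
```

For example, a fix taken one minute after the last report at the same position is skipped, while a fix roughly 200 meters away is sent immediately.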


3.2. The web application 

The web client is the point of interaction with doctors or health specialists. Through this website,
it is possible to establish the starting point of a contagion (i.e. the user) and the date, so that the execution of
the algorithm alerts those possibly affected and allows the evolution of a disease to be visualized, such as
the number of people affected, the areas and statistical data on these. In addition, possible sources can be
searched from a list of users, to see where a contagion might have arisen.

The web page is the tool that the doctor works with day by day: the doctor consults it every day and it
is part of his or her work. All possible actions are aimed at the study of different diseases, infections and
outbreaks using patient data, and at showing all this information once computed. The website communicates with
the API, which queries the system databases and returns all the information requested. The main interface,
shown in Figure 1, is divided into several areas: information about contagions, a patient search engine,
a form for sending messages to patients and a function bar at the top. The top left shows
the identification of the health center in which the application runs. The upper right shows the user name
of the doctor who is using the application.


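[Figure 1: screenshot of the web client. Recoverable labels: health center "HOSPITAL UNIVERSITARIO 12 DE OCTUBRE"; function bar "CONTAGIOS", "FOCOS", "ESTADISTICAS", "@SOCIAL"; panels "ENFERMEDADES ACTIVAS" and "BUSCADOR DE USUARIOS"; a Google map; logged-in user "Manuel Martinez Sanchez".]

Figure 1. Main interface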


The contagion information area shows a list of all areas with epidemics. The information is shown in
real time. For each epidemic disease, it indicates: the city or town where the focus is located,
the number of infections, the number of deaths caused by the epidemic, the level of danger
and a color: yellow (level 1), orange (level 2) or red (level 3). The color graphically depicts
the danger of the epidemic. A heat map shows the areas where there is an epidemic. If more detail is needed,
the map allows zooming and offers options to alternate between different views of the map:
political or physical.

Next to each row of information, there is a delete button that removes a focus once it has
disappeared. When it is pressed, the alert level goes to 0 and the focus disappears from the list and the heat
map, and an alert message reports the change. There is also an update button that refreshes the displayed
data in real time in case any of them has been modified during the observation period. When a user clicks
on an area of the map, a new window appears showing the streets and areas of the city that
are dangerous (because there is a focus of infection).

In the upper right portion of the page, there is a patient search engine that uses the patient's ID
number. If the identifier does not exist or is incorrect, a warning message is displayed. If it is
correct, the patient's report is retrieved, showing personal data and observations about the disease.
In addition, the doctor can record whether a user has died because of a contagious disease. The report also has
a link to send an email to the patient. If the patient is sick, all of the patient's diseases are shown, each
with a Heal button that the doctor can press to remove that disease from the patient once cured. If all
the diseases of the patient disappear, the condition of the patient is updated from sick to cured.

The function bar offers the following options:

Contagions. This link leads to a page on which the doctor can register a new infected patient. It is a form
in which the doctor must enter: ID number, exposure time, minimum distance for contagion,
disease, discharge date, severity, description, and a parameter giving a number of days. This last
parameter sets how far back the search for patients who could have been infected must go. An algorithm is
then executed to search for all users who meet the parameters entered (potentially infected patients)
and to generate a notification for each of them. On this page the doctor can also consult
the notifications that have been sent in recent days, as shown in Figure 2(a). A user search by identifier
displays all the notifications sent, in a table with the identifier, the name,
the illness for which the user was notified and the date of the notification. The doctor can confirm
the contagion after a physical examination or mark it as a false alarm if the patient does not present any disease.

Sources. This link leads to a page on which the doctor can consult the active sources or register a new
source. The Active sources page, shown in Figure 2(b), lists every source. Each
row shows the number of people affected, the start date and a description of the outbreak. It is also possible
to cancel a focus. The Register a source page allows the doctor to find the place of origin of a disease using
a series of infected individuals. For this purpose, a form is available in which the doctor can enter
the identification number of an infected patient and add it to a list. All the users searched for are shown
in a table. Next, a script is executed that performs a triangulation with the infected patients to find
the focus of the epidemic. When the algorithm finishes crossing all the places and obtaining the results,
a table with all of them is shown, as well as a map with a mark at the point where the focus of the
disease may have been located.

Figure 2. (a) Notifications, (b) Active sources


Statistics. This link provides access to a set of graphical representations of statistics, either
general or specific to a disease. The general statistics show a ranking of the top 5 most
contagious diseases, the diseases with the highest mortality, and the status of all patients. For specific
diseases, there is a search engine to retrieve information about a particular disease. First,
a table shows the number of people infected by that disease, the deaths it has caused,
and whether or not it has been eradicated. There are also two charts showing statistical
data on infections by sex and age, as well as the number of deaths caused by the
disease. In all cases, it is possible to export all the data and generate a PDF report with all the charts and
tables of the diseases.

@Social. This link shows a heat map of the presence of diseases globally, using the data
obtained from the web by the data collector. It is also possible to query a specific area by name and
date, which shows a list of diseases in this area by date. A filter shows
the diseases active on a specific date.


3.2. RESTful APIs 


There are 2 APIs that are the entry point for mobile applications and the website, making it possible 


to abstract their functionality and interact with databases to save and request data: 


a. 


The first API allows interacting with all data related to users registered in the system (diseases, 
notifications, news, infections and outbreaks). This API offers the following functionality: 
b. Related to users: 1) User registration: given the name, surname, email, date of birth, gender, weight,
identification document (DNI/NIE) and password, the user is added to the database if there is no user with
the same identification document or the same email; 2) Marking a user as deceased: if a user dies,
the user's status is changed to "deceased" and the number of deaths is increased by one for each
disease to which the user's active contagions belong; 3) Obtaining
a user's data from the identification document and password, for checking credentials when logging into
the mobile application; 4) Obtaining a user from the identification document. If the user exists,
the user's sociodemographic data and active contagions are returned together with the data of the disease
to which each contagion belongs; 5) Obtaining the number of users in each state (undefined, healthy,
cured, sick, and deceased).

c. Related to diseases: 1) Obtaining the list of diseases; 2) Obtaining a given number of diseases with
the greatest number of infected users; 3) Obtaining a given number of diseases with the highest
number of deaths; 4) Obtaining the data of a disease by name; 5) Obtaining the list
of users affected by a disease with their gender, age, and weight.

d. Related to contagions: 1) Given an initial user, a distance, a time window into the past, an exposure
time and a description, the data are analyzed in search of possibly infected users, and a notification is added
for each of those affected; 2) Given a user and a contagion, the user is added to the list of users belonging to
the contagion. In addition, the data of the disease to which the contagion belongs are updated: one is
added to the number of people infected in the user's age range, the average weight
of the people affected by the disease is updated, and one is added to the number of people infected of the
same gender as the user; 3) Given a user and a contagion, the user is removed from the list of users belonging
to the contagion. In addition, if the user has no more active contagions, the user's status is changed to
"cured"; 4) Given a contagion, its status is updated to "inactive"; 5) Obtaining the list
of all active contagions; 6) Obtaining the list of contagions that belong to a disease, by
name, and the list of users that belong to each of these contagions; 7) Obtaining the geographical points
(latitude and longitude) where the active contagions are.

e. Related to outbreaks: 1) Given a list of user identifiers (DNI/NIE) and a disease, the possible points
(foci) are obtained where either all the users of the list or a smaller group of them could have been infected.
In addition, a focus is added to the given disease, and a place is added to that focus for each point found;
2) Given a focus and a place, the place is removed from the list of places of the focus;
3) Obtaining the list of active foci with their respective places.

f. Related to notifications: 1) Given the identification document of a user, all pending notifications are
obtained, that is, the possible diseases that the user may have contracted as detected by
the system; 2) Given a user and a notification, it is recorded that the user has attended the medical
review regarding a possible contagion. If the user tests positive for the contagious disease,
the user is added to the list of infected users (updating the data of the disease accordingly); 3) Given
a user and a notification, the notification is removed from the system. In addition, the user's status is
updated to "healthy" or "cured" as appropriate.

g. The second API allows interacting with data on diseases and geographical areas from social
networks, newspapers and official websites. This API offers the following features: 1) Obtain the list
of active diseases together with the place where they are present and the number of times they have
been mentioned on Twitter, in newspapers and on the CDC website [30]; 2) Obtain the list of active
diseases in a place and/or on a specific date, with the same information on mentions
on the Web; 3) Obtain the list of geographical locations and the weight of each point for the active
diseases, as well as the centers of the respective foci; 4) Obtain the list of geographic locations and
the weight of each point for a specific disease, as well as the centers of the respective foci.
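
Some of the user operations listed above can be illustrated with an in-memory Python sketch. It does not reproduce the actual API: the class `UserStore`, its method names and the choice of "undefined" as the initial status are assumptions made for the example, and a real implementation would sit behind HTTP endpoints and a database.

```python
from collections import Counter

STATES = ("undefined", "healthy", "cured", "sick", "deceased")


class UserStore:
    """Hypothetical in-memory stand-in for the user part of the first API."""

    def __init__(self):
        self._by_doc = {}  # identification document -> user record

    def register(self, doc_id, email, **profile):
        """Add the user unless the document or the email is already registered."""
        if doc_id in self._by_doc:
            return False
        if any(user["email"] == email for user in self._by_doc.values()):
            return False
        # Assumption: new users start in the "undefined" state.
        self._by_doc[doc_id] = {"email": email, "status": "undefined", **profile}
        return True

    def set_status(self, doc_id, status):
        """Change a user's state, e.g. to "deceased" or "cured"."""
        if status not in STATES:
            raise ValueError(f"unknown status: {status}")
        self._by_doc[doc_id]["status"] = status

    def status_counts(self):
        """Number of users in each state."""
        counts = Counter(user["status"] for user in self._by_doc.values())
        return {state: counts.get(state, 0) for state in STATES}
```

The duplicate check in `register` corresponds to the rule that a user is only added when neither the identification document nor the email is already present.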


3.3. Analysis of data 

Several programs are used to analyze the data generated in social networks such as Twitter,
on the websites of official organizations such as the CDC [30] and in online newspapers.
These data are unstructured, so they need to be processed to retrieve the information
they contain. The main function of the data obtained from the Web is to check the presence of diseases
in the different localities of the planet, and also to measure the "concern" or "awareness" that the population
of a geographical location has about a disease.

First of all, there are programs that analyze the health sections of "El Mundo", "Público" and "20
minutos". One program runs once a day for each of them, analyzing the pages in search of news
that mentions any of the currently active diseases in Spanish or English. If anything is found,
the text is analyzed in search of geographical locations, language and sentiment (positive, neutral or negative),
in order to give a more appropriate value to the content and the "concern" it reflects, and
the corresponding diseases and the geographical points in which they are located are updated. Another
program analyzes the alerts section of the CDC (Centers for Disease Control and Prevention) page [30]. Like
the news programs, it is executed daily and updates the database with the information collected
about diseases and their geographical distribution. To analyze the data produced on the social network Twitter,
two programs run continuously. The first one monitors the tweets produced by
official accounts, such as "sanidadgob" (corresponding to the Ministry of Health of the Spanish
Government), ECDC_Outbreaks, CDCespanol and CDCemergency.

The second program analyzes in real time all tweets whose text contains any of
the currently active diseases, in Spanish or English. In both cases, the text is analyzed in search
of geographic locations, language and sentiment, and the database is updated with the corresponding diseases
and their geographical points. Once the tweets or news have been obtained, the processing is as follows:

— Get the language using the MonkeyLearn API. 

— According to the language, get the sentiment using the MonkeyLearn API.

— Obtain the names of the diseases mentioned in the news, according to those existing in the database.

— Get the names of the locations mentioned in the news, using the MonkeyLearn API.

— Obtain all possible combinations of disease and location. If no location is found, all the locations
of that disease are obtained from the database and the item is treated as generic news.

— If there is no entry in the MySQL database with this combination of disease and location, it is created;
if it exists, its update date, weight and alert level are updated.

— If there is no entry in the MongoDB database with this combination of disease and location, it is
created. If there is no center within 500 meters of that location, one is added.
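
The combination and update steps above can be sketched in Python as follows. Several simplifications are assumed: language and sentiment detection are left out, location extraction is passed in as a callable (`find_locations`) instead of calling the MonkeyLearn API, a dictionary stands in for the database tables, and the weight is simply incremented by one per mention because the paper does not state the weighting rule. The 500 meter center rule is not covered.

```python
def process_item(text, known_diseases, known_locations, find_locations, entries, today):
    """Update `entries`, keyed by (disease, location), from one tweet or news item.

    known_diseases  -- disease names currently in the database
    known_locations -- {disease: [locations already stored for it]}
    find_locations  -- callable returning the locations mentioned in a text
    """
    lowered = text.lower()
    diseases = [d for d in known_diseases if d.lower() in lowered]
    found = find_locations(text)
    for disease in diseases:
        # Generic news: no location mentioned, so use the stored locations.
        places = found or known_locations.get(disease, [])
        for place in places:
            # Create the entry if the combination is new, then update it.
            entry = entries.setdefault((disease, place), {"weight": 0, "updated": None})
            entry["weight"] += 1  # assumed rule: one unit per mention
            entry["updated"] = today
    return entries
```

With this sketch, an item that mentions a disease but no place raises the weight of every location already stored for that disease, which is one possible reading of "treated as generic news".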

The program that analyzes the alerts section of the CDC (Centers for Disease Control and Prevention)
page works as follows. The entries always follow the same format: level, date, illness (possible
alternative name), place (possible specification). The text of each entry is analyzed in search of these
fields and, once they are obtained, the database is queried. If no matching entry exists, one is created;
if it exists, the modification date is updated and a CDC mention is added to the entry.
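
Parsing an entry with that field order can be sketched with a regular expression. The concrete layout assumed here (comma-separated fields, ISO dates, optional parenthesized alternative name and place detail) and the sample line in the usage note are hypothetical: the paper only gives the order of the fields, not the exact text of the CDC entries.

```python
import re

# Assumed layout: "level, date, illness (alt name), place (detail)".
ENTRY = re.compile(
    r"^(?P<level>[^,]+),\s*(?P<date>[^,]+),\s*"
    r"(?P<illness>[^,(]+?)\s*(?:\((?P<alt_name>[^)]*)\))?,\s*"
    r"(?P<place>[^,(]+?)\s*(?:\((?P<detail>[^)]*)\))?$"
)


def parse_entry(line):
    """Return the fields of one alert entry as a dict, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    # Optional groups that did not take part in the match are returned as None.
    return {key: (value.strip() if value else None)
            for key, value in match.groupdict().items()}
```

For instance, `parse_entry("Alert Level 2, 2018-03-01, Yellow Fever (YF), Brazil (Sao Paulo state)")` yields the six fields separately, after which the (illness, place) pair can be looked up in the database as described above.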

The system requires two types of calculations: 

a. Possibly infected people. Given an initial user who has been infected with a specific disease,
the particular characteristics of the disease are analyzed in order to set the values of 3 variables that
determine whether a person using the system has been infected or not: exposure time,
distance to the epidemic focus and time of infection. All the variables depend on
the disease considered. The first variable is the minimum time of exposure to
a specific disease for a person to become infected. The second variable is
the minimum distance to the infection center for a person to become infected. The third
variable is the time, in days, at which the disease begins to be contagious. The combination of these 3
variables defines a so-called contagion window, which represents the physical and temporal limits for
a person to be considered infected. To calculate the contagion window, the input data are: an initial
user, a time window into the past (which sets the first day to be taken into account), a maximum distance
and a minimum exposure time. The algorithm traverses the initial user's points in the specified time window
and compares them with the points of other users in the same time window, matching timestamps with
a tolerance of ±5 minutes (since that is the time interval between location reports if the user moves),
and then compares the positions. If two points are closer than the specified distance, all the points of
both users are compared. If the distance condition is also met at all
points during the specified exposure time, that user is added to the list of infected users.

b. Search for the origin of infection. Given a list of infected users, the goal is to find the source
of infection. To do this, a triangulation is performed using the positions at which each user has been in
the past. The path that each user followed is reconstructed from these positions, and
the intersections between the paths of the users considered are searched. To calculate the infection focus,
the algorithm obtains the path of each infected user from the list of geographical points
(locations) where the user has been in the past. Next, it generates all the possible combinations of users,
starting with pairs and continuing until all the users of the list are considered together. Once
the combinations are generated, possible crossings are looked for (intersections of the straight segments
that form each path). If an intersection is found, it is added to the list. Combinations are used because
users may have been infected at different sources, so a search for a single point common to all paths
could give a false negative.
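
Both calculations can be sketched in Python under stated assumptions. For the contagion window, exposure is interpreted as the longest run of consecutive points of the initial user during which the other user has a point within the time tolerance and the maximum distance; the paper does not specify this detail. For the focus search, only pairs of users are combined (the text also considers larger groups) and paths are treated as planar (x, y) polylines. All names (`Fix`, `possibly_infected`, `candidate_foci`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Fix:
    """One location sample sent by the app (hypothetical structure)."""
    t: datetime
    lat: float
    lon: float


def haversine_m(a, b):
    """Great-circle distance in meters between two fixes."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def possibly_infected(index_track, others, since, max_dist_m, min_exposure,
                      tol=timedelta(minutes=5)):
    """Users who stayed within max_dist_m of the initial user for min_exposure."""
    window = sorted((f for f in index_track if f.t >= since), key=lambda f: f.t)
    flagged = []
    for user, track in others.items():
        run_start, longest = None, timedelta(0)
        for p in window:
            close = any(abs(q.t - p.t) <= tol and haversine_m(p, q) <= max_dist_m
                        for q in track)
            if close:
                if run_start is None:
                    run_start = p.t
                longest = max(longest, p.t - run_start)
            else:
                run_start = None  # the contact was interrupted
        if longest >= min_exposure:
            flagged.append(user)
    return flagged


def segment_intersection(a1, a2, b1, b2):
    """Crossing point of segments a1-a2 and b1-b2 in planar coordinates, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = a1, a2, b1, b2
    den = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if den == 0:  # parallel or collinear segments are ignored in this sketch
        return None
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / den
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / den
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def candidate_foci(paths):
    """Crossing points between the paths of every pair of users."""
    foci = []
    for path_a, path_b in combinations(paths.values(), 2):
        for a1, a2 in zip(path_a, path_a[1:]):
            for b1, b2 in zip(path_b, path_b[1:]):
                hit = segment_intersection(a1, a2, b1, b2)
                if hit is not None:
                    foci.append(hit)
    return foci
```

The contact search compares every point of the initial user with every point of every other user, which is simple but quadratic; an implementation over many users would likely index points by time and area first.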


4. CONCLUSION 
In recent years there have been several epidemics globally, which have shown that the time factor
is key. How quickly data about infections or the focus of the epidemic become available directly influences
the extent and effects of the epidemic. This paper presents a system that uses an Android app to collect data
from the sensors of mobile phones and a web application to collect data on patients who visit medical
centers. When an infection is detected, a set of processes analyzes the data collected from mobile phones
and patients. As a result, alerts are generated and sent both to users who could be infected and to those who
are not. Likewise, the system allows infected patients and sources of infection to be tracked, and statistics
to be computed on the data collected. In addition to the data sources mentioned, the system also retrieves
information from official and unofficial sources such as Twitter or Facebook. As future work, we want to add
new sources of information and to use machine learning algorithms that can predict where an epidemic could
occur through the analysis of information.